Abstract

This study concerns problems of time-series forecasting under the weakest of assumptions. Related results are surveyed and are points of departure for the developments here, some of which are new and others are new derivations of previous findings. The contributions in this study are all negative, showing that various plausible prediction problems are unsolvable, or in other cases, are not solvable by predictors which are known to be consistent when mixing conditions hold.

1 Introduction



Given a random variable sequence, such as $X_0^{n-1} = \{X_0, \ldots, X_{n-1}\}$, a typical prediction problem is to provide from this data an estimate, say $\hat{E}(X_0^{n-1})$, of the succeeding value $X_n$. Following the influential book Extrapolation, Interpolation, and Smoothing of Stationary Time Series by N. Wiener [19], the emphasis in prediction theory has been (and still is) to find estimators which are convolutions

$$\hat{E}(X_0^{n-1}) = \sum_{i=1}^{n} \alpha_i X_{n-i} \tag{1}$$

of preceding observations. Here the $\alpha_i$'s are presumed to be fixed real numbers determined entirely by the process covariance function. It is of course well known that aside from the Gaussian process case, linear predictors do not generally give the least-squares optimal prediction, or even a consistent approximation (as the data base grows) of the optimal estimator, which is the conditional expectation $E(X_n \mid X_0^{n-1})$ of $X_n$. If the time series happens to be generated by the nonlinear autoregression $X_n = \sqrt{|X_{n-1}|} + e_n$ for some i.i.d. non-singular noise sequence $\{e_n\}$, then no matter how the linear parameters in (1) are adjusted, the expected squared error of the linear prediction of $X_n$ given $\{X_i, i < n\}$ will be worse than that of the estimate $m(X_{n-1}) = \sqrt{|X_{n-1}|}$.
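The gap between linear and nonlinear prediction in this example can be seen in a small simulation. The sketch below is only an illustration under assumptions not made in the text: the noise is taken to be standard normal, and the linear competitor is restricted to a least-squares fit on the single lag $X_{n-1}$ with an intercept, rather than the full convolution (1). The function names `simulate` and `compare` are hypothetical.

```python
import math
import random

def simulate(n, seed=0):
    """Simulate X_t = sqrt(|X_{t-1}|) + e_t with standard normal noise (assumed noise law)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n):
        x.append(math.sqrt(abs(x[-1])) + rng.gauss(0.0, 1.0))
    return x[100:]  # drop a short burn-in

def compare(x):
    """Return (MSE of a fitted linear lag-1 predictor, MSE of m(x) = sqrt(|x|))."""
    prev, nxt = x[:-1], x[1:]
    n = len(prev)
    mean_prev, mean_next = sum(prev) / n, sum(nxt) / n
    cov = sum((p - mean_prev) * (q - mean_next) for p, q in zip(prev, nxt)) / n
    var = sum((p - mean_prev) ** 2 for p in prev) / n
    slope = cov / var
    intercept = mean_next - slope * mean_prev
    mse_lin = sum((q - (intercept + slope * p)) ** 2 for p, q in zip(prev, nxt)) / n
    mse_m = sum((q - math.sqrt(abs(p))) ** 2 for p, q in zip(prev, nxt)) / n
    return mse_lin, mse_m

mse_linear, mse_nonlinear = compare(simulate(20000))
print(mse_linear, mse_nonlinear)
```

Under these assumptions the error of $m$ should be close to the noise variance, while the fitted linear rule carries an additional error from approximating $\sqrt{|x|}$ by a straight line.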

The Kalman filter and ARMA (or as it is sometimes called, Box/Jenkins) methods 
are equivalent to (1), as are predictors based on spectral analysis. These "second-order" 
techniques were well-suited to the period before about 1970 when data set size and access 
to computer power were relatively limited. 

Beginning with the pioneering work of Roussas [15] and Rosenblatt [14], nonparametric methods worked their way into the literature of forecasting for dependent series. Several people, including the authors, have investigated forecasting problems, such as enunciated by Cover [4], under the sole hypotheses of stationarity and ergodicity. Two classical results for stationary ergodic sequences, namely, Birkhoff's Theorem,

$$\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} X_i = E(X) \quad \text{almost surely,}$$

and the Glivenko-Cantelli Theorem,

$$\lim_{n\to\infty} \sup_x |F_n(x) - F(x)| = 0 \quad \text{almost surely,}$$

for convergence of the empirical to the true distribution function are clear evidence that some statistical problems are solvable under weak assumptions regarding dependency. In fact, since nonergodic stationary sequences can be viewed as mixtures of ergodic modes, ergodicity itself is not a vital assumption for prediction. This matter is discussed in [10].

On the other hand, not all problems solvable for independent sequences can be mastered in the general setting. For instance, Gyorfi and Lugosi [8] show that the kernel density estimator is not universally consistent, even though we do have consistency of the recursive kernel density estimator under ergodicity provided that for some integer $m_0$ the conditional density of $X_0$ given the condition $X_{-m_0}^{-1}$ exists (Gyorfi and Masry [9]).

It will be useful to distinguish between two classes of prediction problems.

Static forecasting. Find an estimator $\hat{E}(X_{-n}^{-1})$ of the value $E(X_0 \mid X_{-N}^{-1})$ such that for any stationary and ergodic sequence $\{X_i\}$ with values in some given coordinate set $\mathcal{X}$, almost surely,

$$\lim_{n\to\infty} \hat{E}(X_{-n}^{-1}) = E(X_0 \mid X_{-N}^{-1}). \tag{2}$$

In (2), $N$ may be $\infty$, in which case we will speak of static total-past prediction. Otherwise, this is called the static autoregression problem. In either case, it is presumed that the forecaster $\hat{E}(X_{-n}^{-1})$ depends only on the data segment $X_{-n}^{-1}$.

The other problem of interest here is the following.

Dynamic forecasting. Find an estimator $\hat{E}(X_0^{n-1})$ of the value $E(X_n \mid X_{n-N}^{n-1})$ such that for any stationary and ergodic sequence $\{X_i\}$ taking values in a given set $\mathcal{X}$, almost surely,

$$\lim_{n\to\infty} \left| \hat{E}(X_0^{n-1}) - E(X_n \mid X_{n-N}^{n-1}) \right| = 0. \tag{3}$$

Here $N$ is typically either $n$ or a fixed positive integer, and the estimator must be constructible from data collected from time 0 up to the "current" time $n - 1$. When $N$ is a fixed positive integer, we have the dynamic autoregression problem, and the alternative category will be referred to as the dynamic total-past forecasting problem.

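When the coordinate set $\mathcal{X}$ is finite or countably infinite, for both autoregression problems ($N < \infty$) one may construct an estimator with consistency verified by simple application of the ergodic theorem. Thus, for static autoregression, the observed sequence $X_{-N}^{-1}$ has positive marginal probability. Define for $n > N$,

$$\mathrm{Num}(X_{-N}^{-1}, n) = \sum_{i=1}^{n-N} 1_{[X_{-N-i}^{-1-i} = X_{-N}^{-1}]} X_{-i} \tag{4}$$

$$\mathrm{Denom}(X_{-N}^{-1}, n) = \sum_{i=1}^{n-N} 1_{[X_{-N-i}^{-1-i} = X_{-N}^{-1}]} \tag{5}$$

$$g(x_{-1}, \ldots, x_{-N}) = E\left( X_0 1_{[X_{-N}^{-1} = x_{-N}^{-1}]} \right) \tag{6}$$

$$h(x_{-1}, \ldots, x_{-N}) = P(X_{-N}^{-1} = x_{-N}^{-1}). \tag{7}$$

From the ergodic theorem, a.s.,

$$\frac{1}{n-N} \mathrm{Num}(X_{-N}^{-1}, n) \to g(X_{-N}^{-1}) \tag{8}$$

$$\frac{1}{n-N} \mathrm{Denom}(X_{-N}^{-1}, n) \to h(X_{-N}^{-1}) \tag{9}$$

and this implies the consistency of the estimate

$$\hat{E}(X_{-n}^{-1}) = \mathrm{Num}(X_{-N}^{-1}, n) / \mathrm{Denom}(X_{-N}^{-1}, n),$$

i.e., almost surely, as $n \to \infty$,

$$\hat{E}(X_{-n}^{-1}) \to \frac{g(X_{-N}^{-1})}{h(X_{-N}^{-1})} = E(X_0 \mid X_{-N}^{-1}). \tag{10}$$

For the dynamic case, take

$$\hat{E}(X_0^{n-1}) = \frac{\sum_{i=N}^{n-1} 1_{[X_{i-N}^{i-1} = X_{n-N}^{n-1}]} X_i}{\sum_{i=N}^{n-1} 1_{[X_{i-N}^{i-1} = X_{n-N}^{n-1}]}}. \tag{11}$$

Since now there are but finitely many possible strings $X_{n-N}^{n-1}$, the ergodic theorem implies we have a.s. convergence of the estimator of the successor value on each of them.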
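The dynamic estimator amounts to averaging the successors of every earlier occurrence of the most recent length-$N$ block. A minimal sketch follows, assuming the data are held in a Python list and adopting the convention that the ratio is 0 when the block has not been seen before; the function name `dynamic_ar_forecast` is hypothetical.

```python
def dynamic_ar_forecast(x, N):
    """Estimate E(X_n | X_{n-N}^{n-1}) from x = [X_0, ..., X_{n-1}], N >= 1.

    Averages the successors X_i of every earlier length-N block
    X_{i-N}^{i-1} that equals the most recent block X_{n-N}^{n-1}.
    Returns 0.0 when no earlier block matches (assumed convention 0/0 = 0).
    """
    n = len(x)
    if n < N:
        return 0.0
    target = tuple(x[n - N:n])
    num = 0.0
    den = 0
    for i in range(N, n):
        if tuple(x[i - N:i]) == target:
            num += x[i]   # successor of a matching block
            den += 1
    return num / den if den else 0.0
```

For example, with `x = [0, 1, 0, 0, 0]` and `N = 1` the last symbol 0 occurred three times earlier, followed by 1, 0 and 0, so the forecast is 1/3.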

In 1978, Ornstein [12] provided an estimator for the static, finite $\mathcal{X}$ total-past prediction problem. In 1992, Algoet [1] generalized Ornstein's findings to allow that $\mathcal{X}$ can be any Polish space. More recently, Morvai, Yakowitz and Gyorfi [11] gave a simpler algorithm and convergence proof for that problem. It is to be admitted that at this point, these algorithms are terribly unwieldy.



The partitioning estimator is a representative computationally feasible nonparametric algorithm. Such methods attracted a great deal of theoretical attention in the 1980s, much of it being summarized and referenced in the monograph [7]. This partitioning method, and its relatives such as the nearest neighbor and the kernel autoregressions, are known to consistently estimate the conditional expectation $E(X_0 \mid X_{-1})$ under a great many "mixing" conditions regarding the degree of dependency of the present and future on the distant past (cf. Chapter III in [7]). These mixing conditions, while plausible, are difficult to check. There is virtually no literature on inference of mixing conditions and mixing parameters from data.

In view of these positive results under mixing, we wanted to show that the partitioning regression estimate, known to be effective for time series under a variety of mixing conditions, suffices for static autoregressive forecasting when $\mathcal{X}$ is real. Such a finding would be interesting because this method is straightforward to apply and, in a certain sense, is economical with data. This conjecture turns out to be untrue. We will show that there exists a partition sequence which satisfies the usual conditions and a stationary ergodic time series $\{X_n\}$ such that on a set of positive probability, for the partitioning estimate $\hat{E}(X_{-n}^{-1})$,

$$\limsup_{n\to\infty} \left| \hat{E}(X_{-n}^{-1}) - E(X_0 \mid X_{-1}) \right| \ge 0.5. \tag{12}$$

This and a related result are demonstrated in Section 3.

Turning attention to dynamic forecasting, in Section 2, we relate a theorem due to Bailey [2] stating that, in contrast to the static case, even for binary sequences, there is no algorithm that can achieve a.s. convergence in the sense of (3) for the dynamic total-past problem with $N = n$. On the other hand, it is evident that algorithms such as [1] or [11], which provide a solution to the a.s. static forecasting problem, can be modified to achieve convergence in probability for this recalcitrant case. Details of a conversion were given in [10], which gives yet another plan for attaining weak convergence of dynamic forecasters. When the coordinate space is finite, it turns out that implicitly, algorithms for inferring entropy (e.g., [20]) can also be utilized for constructing weakly convergent static and dynamic autoregressive forecasters. This has been noted (e.g., [16]), and discussed at length in Section IV of [10].

2 Dynamic forecasting 

Let $\{X_i\}_{-\infty}^{\infty}$ be a stationary ergodic binary-valued process. The goal is to find a predictor $\hat{E}(X_0^{n-1})$ of the value $E(X_n \mid X_0^{n-1})$ such that almost surely,

$$\lim_{n\to\infty} \left| \hat{E}(X_0^{n-1}) - E(X_n \mid X_0^{n-1}) \right| = 0$$

for all stationary and ergodic processes. We show by the statement below that this goal is not achievable.

Theorem 1 (Bailey [2], Ryabko [16]) For any estimator $\{\hat{E}(X_0^{n-1})\}$ there is a stationary ergodic binary-valued process $\{X_i\}$ such that

$$P\left( \limsup_{n\to\infty} \left| \hat{E}(X_0^{n-1}) - E(X_n \mid X_0^{n-1}) \right| \ge 1/4 \right) \ge \frac{1}{8}.$$

Remark Bailey's counterexample for dynamic total-past forecasting uses the technique of cutting and stacking developed by Ornstein [13] (see also Shields [18]). Bailey's proof has not been published and is hard to follow, whereas Ryabko omitted his lengthy proof and only sketched an intuitive argument in his paper. These results are not widely known. In view of their significance to the issue of the "limits of forecasting", we wanted to unambiguously enter them into the easily accessible literature.

Proof The present proof is a simplification of the clever counterexample of Ryabko [16]. First we define a Markov process which serves as the technical tool for construction of our counterexample. Let the state space $S$ be the non-negative integers. From state 0 the process certainly passes to state 1 and then to state 2, at the following epoch. From each state $s \ge 2$, the Markov chain passes either to state 0 or to state $s + 1$ with equal probabilities 0.5. This construction yields a stationary and ergodic Markov process $\{M_i\}$ with stationary distribution

$$P(M_i = 0) = P(M_i = 1) = \frac{1}{4}$$

and

$$P(M_i = j) = \frac{1}{2^j} \quad \text{for } j \ge 2.$$

Let $\tau_k$ denote the first positive time of occurrence of state $2k$:

$$\tau_k = \min\{i > 0 : M_i = 2k\}.$$

Note that if $M_0 = 0$ then $M_i \le 2k$ for $0 \le i \le \tau_k$. Now we define the hidden Markov chain $\{X_i\}$, which we denote as $X_i = f(M_i)$. It will serve as the stationary unpredictable time series. We will use the notation $M_0^n$ to denote the sequence of states $M_0, \ldots, M_n$. Let $f(0) = 0$, $f(1) = 0$, and $f(s) = 1$ for all even states $s$. A feature of this definition of $f(\cdot)$ is that whenever $X_n = 0$, $X_{n+1} = 0$, $X_{n+2} = 1$ we know that $M_n = 0$ and vice versa. Next we will define $f(s)$ for odd states $s$ maliciously. We define $f(2k+1)$ inductively for $k \ge 1$. Assume $f(2l+1)$ is defined for $l < k$. If $M_0 = 0$ (that is, $f(M_0) = 0$, $f(M_1) = 0$, $f(M_2) = 1$) then $M_i \le 2k$ for $0 \le i \le \tau_k$ and the mapping

$$M_0^{\tau_k} \to (f(M_0), \ldots, f(M_{\tau_k}))$$

is invertible. (Given $X_0^n$, find $1 \le l \le n$ and integers $0 = r_0 < r_1 < \ldots < r_l = n + 1$ such that $X_0^n = (X_{r_0}^{r_1 - 1}, \ldots, X_{r_{l-1}}^{r_l - 1})$, where $2 \le r_{i+1} - 1 - r_i < 2k$ for $0 \le i < l - 1$, $r_l - 1 - r_{l-1} = 2k$, and for $0 \le i < l$, $X_{r_i}^{r_{i+1} - 1} = (f(0), f(1), \ldots, f(r_{i+1} - 1 - r_i))$. Now $\tau_k = n$ and $M_{r_i}^{r_{i+1} - 1} = (0, 1, \ldots, r_{i+1} - 1 - r_i)$ for $0 \le i < l$. This construction is always possible under our postulates that $M_0 = 0$ and $\tau_k = n$.) Let

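$$B_k^+ = \{M_0 = 0, \ \hat{E}(f(M_0), \ldots, f(M_{\tau_k})) \ge \tfrac{1}{4}\}$$

and

$$B_k^- = \{M_0 = 0, \ \hat{E}(f(M_0), \ldots, f(M_{\tau_k})) < \tfrac{1}{4}\}.$$

Now notice that the events $B_k^+$ and $B_k^-$ do not depend on the future values of $f(2r+1)$ for $r \ge k$, and one of these events must have probability at least 1/8 since

$$P(B_k^+) + P(B_k^-) = P(M_0 = 0) = \frac{1}{4}.$$

Let $I_k$ denote the most likely of the events $B_k^+$ and $B_k^-$, and inductively define

$$f(2k+1) = \begin{cases} 1 & \text{if } I_k = B_k^-, \\ 0 & \text{if } I_k = B_k^+. \end{cases}$$

Because of the construction of $\{M_i\}$, on event $I_k$,

$$\begin{aligned} E(X_{\tau_k+1} \mid X_0^{\tau_k}) &= f(2k+1) P(X_{\tau_k+1} = f(2k+1) \mid X_0^{\tau_k}) \\ &= f(2k+1) P(M_{\tau_k+1} = 2k+1 \mid M_0^{\tau_k}) \\ &= 0.5 f(2k+1). \end{aligned}$$

The conditional expectation $E(X_{\tau_k+1} \mid X_0^{\tau_k})$ and the estimate $\hat{E}(X_0^{\tau_k})$ differ by at least 1/4 on the event $I_k$, and this event occurs with probability at least 1/8. By Fatou's lemma,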

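$$\begin{aligned} & P\left( \limsup_{n\to\infty} \{ |\hat{E}(X_0^{n-1}) - E(X_n \mid X_0^{n-1})| \ge 1/4 \} \right) \\ &\ge P\left( \limsup_{n\to\infty} \{ |\hat{E}(X_0^{n-1}) - E(X_n \mid X_0^{n-1})| \ge 1/4, \ X_0 = X_1 = 0, X_2 = 1 \} \right) \\ &\ge P\left( \limsup_{k\to\infty} \{ |\hat{E}(f(M_0), \ldots, f(M_{\tau_k})) - E(f(M_{\tau_k+1}) \mid f(M_0), \ldots, f(M_{\tau_k}))| \ge 1/4, \ M_0 = 0 \} \right) \\ &\ge P\left( \limsup_{k\to\infty} I_k \right) = E\left( \limsup_{k\to\infty} 1(I_k) \right) \ge \limsup_{k\to\infty} E\, 1(I_k) = \limsup_{k\to\infty} P(I_k) \ge \frac{1}{8}. \end{aligned}$$

□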
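As an informal check on the Markov chain $\{M_i\}$ used in the proof, the following sketch simulates the transition rule (0 to 1 to 2, and from each $s \ge 2$ to 0 or $s+1$ with probability 1/2 each) and compares empirical state frequencies with the stationary probabilities 1/4, 1/4 and $2^{-j}$ for $j \ge 2$. The run length, seed and function names are arbitrary choices made for this illustration.

```python
import random
from collections import Counter

def step(state, rng):
    """One transition: 0 -> 1 -> 2; from s >= 2 go to 0 or s + 1 with probability 1/2 each."""
    if state < 2:
        return state + 1
    return 0 if rng.random() < 0.5 else state + 1

def empirical_distribution(n_steps, seed=0):
    """Relative frequency of each visited state over n_steps transitions started at 0."""
    rng = random.Random(seed)
    state, counts = 0, Counter()
    for _ in range(n_steps):
        counts[state] += 1
        state = step(state, rng)
    return {s: c / n_steps for s, c in counts.items()}

freq = empirical_distribution(200000)
for s in range(5):
    exact = 0.25 if s < 2 else 2.0 ** (-s)
    print(s, round(freq.get(s, 0.0), 3), exact)
```

The simulation does not touch the labeling $f$; it only illustrates that state 0, on which the events $B_k^{\pm}$ are built, carries probability 1/4.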

We noted in the Introduction that there are static total-past empirical forecasters (i.e., $N = \infty$ in (2)) which are strongly universally consistent when the coordinate space $\mathcal{X}$ is real. These are readily transcribed to weakly consistent dynamic forecasters. The following (which was inspired by the methods of [16]) shows that one cannot hope for a strongly consistent autoregressive dynamic forecaster.

Let $\{X_i\}_{-\infty}^{\infty}$ be a stationary ergodic real-valued process. The goal is to find a one-step predictor $\hat{E}(X_0^{n-1})$ of the value $E(X_n \mid X_{n-1})$ (i.e. $N = 1$) such that almost surely,

$$\lim_{n\to\infty} \left| \hat{E}(X_0^{n-1}) - E(X_n \mid X_{n-1}) \right| = 0$$

for all stationary and ergodic processes.

Theorem 2 (Ryabko [16]) For any estimator $\{\hat{E}(X_0^{n-1})\}$ there is a stationary ergodic process $\{X_i\}$ with values from a countable subset of the real numbers such that

$$P\left( \limsup_{n\to\infty} \{ |\hat{E}(X_0^{n-1}) - E(X_n \mid X_{n-1})| \ge 1/8 \} \right) \ge \frac{1}{8}.$$

Proof We will use the Markov process $\{M_i\}$ defined in the proof of Theorem 1. Note that one must pass through state $s$ to get to any state $s' > s$ from 0. We construct a hidden Markov chain $\{X_i\}$ which is in fact just a relabeled version of $\{M_i\}$. This construct uses a different (invertible) function $f(\cdot)$, for $X_i = f(M_i)$. Define $f(0) = 0$ and $f(s) = L_s + 2^{-s}$ if $s > 0$, where $L_s$ is either 0 or 1 as specified later. In this way, knowing $X_i$ is equivalent to knowing $M_i$ and vice versa. Thus $X_i = f(M_i)$ where $f$ is one-to-one. For $s \ge 2$ the conditional expectation is

$$E(X_t \mid X_{t-1} = L_s + 2^{-s}) = \frac{L_{s+1} + 2^{-(s+1)}}{2}.$$

We complete the description of the function $f(\cdot)$ and thus the conditional expectation by defining $L_{s+1}$ so as to confound any proposed predictor $\hat{E}(X_0^{n-1})$. Let $\tau_s$ denote the time of first occurrence of state $s$:

$$\tau_s = \min\{i \ge 0 : M_i = s\}.$$
Let $L_1 = L_2 = 0$. Suppose $s \ge 2$. Assume we specified $L_i$ for $i \le s$. Define

$$B_s^+ = \{X_0 = 0, \ \hat{E}(X_0^{\tau_s}) \ge \tfrac{1}{4}\}$$

and

$$B_s^- = \{X_0 = 0, \ \hat{E}(X_0^{\tau_s}) < \tfrac{1}{4}\}.$$

One of the two events must have probability at least 1/8. Take $L_{s+1} = 1$ and $I_s = B_s^-$ if $P(B_s^-) \ge P(B_s^+)$. Let $L_{s+1} = 0$ and $I_s = B_s^+$ if $P(B_s^-) < P(B_s^+)$. The difference of the estimate and the conditional expectation is at least 1/8 on the event $I_s$ and this event occurs with probability not less than 1/8. By Fatou's lemma,

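$$\begin{aligned} & P\left( \limsup_{n\to\infty} \{ |\hat{E}(X_0^{n-1}) - E(X_n \mid X_{n-1})| \ge \tfrac{1}{8} \} \right) \\ &\ge P\left( \limsup_{s\to\infty} \{ |\hat{E}(X_0^{\tau_s}) - E(X_{\tau_s+1} \mid X_{\tau_s})| \ge \tfrac{1}{8}, \ X_0 = 0 \} \right) \\ &\ge P\left( \limsup_{s\to\infty} I_s \right) \ge \limsup_{s\to\infty} P(I_s) \ge \frac{1}{8}. \end{aligned}$$

□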

Remark 1. The counterexample in Theorem 2 is a Markov chain with a countable number of states. (The correspondence between states $s$ and labels $f(s)$ is one-to-one.)

Remark 2. One of the referees noted that the question of whether strongly consistent forecasters exist if the process is postulated to be Gaussian is interesting and open.



3 Partitioning estimates which are not universally 
consistent for autoregressive static forecasting 

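Let $\{(Y_i, Z_i)\}_{-\infty}^{\infty}$ be a stationary sequence taking values from $\mathcal{R} \times \mathcal{R}$. Let $\mathcal{P}_n = \{A_{n,j}\}$ be a partition of the real line. Let $A_n(z)$ denote the cell $A_{n,j}$ of $\mathcal{P}_n$ into which $z$ falls. Let

$$\nu_n(A) = \frac{1}{n-1} \sum_{i=1}^{n-1} I_{[Z_{-i} \in A]} Y_{-i} \tag{13}$$

and

$$\mu_n(A) = \frac{1}{n-1} \sum_{i=1}^{n-1} I_{[Z_{-i} \in A]}. \tag{14}$$

Then the partitioning estimate of the regression function $E(Y_0 \mid Z_0 = z)$ is defined as follows:

$$\hat{m}_n(z) = \frac{\nu_n(A_n(z))}{\mu_n(A_n(z))} = \frac{\sum_{i=1}^{n-1} I_{[Z_{-i} \in A_n(z)]} Y_{-i}}{\sum_{i=1}^{n-1} I_{[Z_{-i} \in A_n(z)]}}. \tag{15}$$

We follow the convention that 0/0 = 0.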
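The partitioning estimate is a cell-wise average of responses. The sketch below assumes, for illustration only, an equal-width partition of the real line into cells $[kh, (k+1)h)$ with a user-supplied width `h`; the text allows general partitions. The function name `partitioning_estimate` is hypothetical.

```python
import math

def partitioning_estimate(z, pairs, h):
    """Partitioning regression estimate at z.

    pairs: iterable of (Y, Z) observations, playing the role of (Y_{-i}, Z_{-i}).
    h: cell width; cells are [k*h, (k+1)*h), an assumed equal-width partition.
    Averages the Y values whose Z falls in the cell containing z,
    with the convention 0/0 = 0.
    """
    cell = math.floor(z / h)
    num = 0.0
    den = 0
    for y, zi in pairs:
        if math.floor(zi / h) == cell:
            num += y
            den += 1
    return num / den if den else 0.0
```

With `pairs = [(1.0, 0.1), (3.0, 0.2), (10.0, 0.6)]` and `h = 0.5`, the estimate at `z = 0.3` averages the two responses in the cell $[0, 0.5)$ and equals 2.0, while at `z = 1.2` the cell is empty and the estimate is 0 by convention. The empty-cell case is exactly what the counterexamples of this section exploit.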
If $\{(Y_i, Z_i)\}$ is i.i.d. or uniform mixing or strong mixing with certain assumptions on the rates of the mixing parameters, then the strong universal consistency of the partitioning estimate has been demonstrated under the proviso that for all intervals $S$ symmetric around 0,

$$\lim_{n\to\infty} \sup_{j : A_{n,j} \cap S \ne \emptyset} \mathrm{diam}(A_{n,j}) = 0 \tag{16}$$

and

$$\lim_{n\to\infty} \frac{|\{j : A_{n,j} \cap S \ne \emptyset\}|}{n} = 0 \tag{17}$$

(cf. Devroye and Gyorfi [5] and Gyorfi [6] for the i.i.d. case, and Chapter III in [7] for mixing and for cubic partitions).

In the discussion to follow, we investigate the problem of one-step (i.e. $N = 1$) autoregressive static forecasting by the partitioning estimate for the case of a stationary and ergodic real-valued process $\{X_i\}_{-\infty}^{\infty}$. Thus the intention is to infer the value $m(x) = E(X_0 \mid X_{-1} = x)$. In this case the partitioning estimate is adapted for autoregressive prediction. The predictor $\hat{m}_n(x)$ is here defined to be the partitioning estimate $\hat{m}_n(z)$ in (15) with $z = x$ for the process $\{Y_i = X_i, Z_i = X_{i-1}\}$. That is,

$$\hat{m}_n(x) = \frac{\nu_n(A_n(x))}{\mu_n(A_n(x))} = \frac{\sum_{i=1}^{n-1} I_{[X_{-i-1} \in A_n(x)]} X_{-i}}{\sum_{i=1}^{n-1} I_{[X_{-i-1} \in A_n(x)]}}. \tag{18}$$

In an obvious way, the partitioning estimate results in a one-step static forecast: $\hat{E}(X_{-n}^{-1}) = \hat{m}_n(X_{-1})$.

In contrast to the success of the partitioning estimate for independent or mixing sequences, we have the following negative results.

Theorem 3 There is a stationary ergodic process $\{X_i\}$ with marginal distribution uniform on $[0,1)$ and a sequence of partitions $\mathcal{P}_n$ satisfying (16) and (17) such that for the partitioning forecaster $\hat{m}_n(X_{-1})$, defined by (18),

$$P\left( \limsup_{n\to\infty} |\hat{m}_n(X_{-1}) - m(X_{-1})| \ge 0.5 \right) \ge 0.5.$$

Proof We will construct a sequence of subsets $B_n$ of $[0,1)$, such that

$$P\left( X_{-1} \in \limsup_{n\to\infty} B_n \right) \ge 0.5$$

and if $X_{-1} \in B_n$ then $X_{-2} \notin B_n, \ldots, X_{-n} \notin B_n$. Thus, when $X_{-1} \in B_n$, we will be assured that none of the data values up to time $n$ are in this set, and consequently a conventional partitioning estimate has no data in the appropriate partition cell. We present first a dynamical system. We will define a transformation $T$ on the unit interval. Consider the binary expansion $r_1^{\infty}$ of each real number $r \in [0,1)$, that is, $r = \sum_{i=1}^{\infty} r_i 2^{-i}$. When there are two expansions, use the representation which contains finitely many 1's. Now let

$$\tau(r) = \min\{i > 0 : r_i = 1\}. \tag{19}$$

Notice that, aside from the exceptional set $\{0\}$, which has Lebesgue measure zero, $\tau$ is finite and well-defined on the closed unit interval. The transformation is defined by

$$(Tr)_i = \begin{cases} 1 & \text{if } 0 < i < \tau(r) \\ 0 & \text{if } i = \tau(r) \\ r_i & \text{if } i > \tau(r). \end{cases} \tag{20}$$

Notice that in fact, $Tr = r - 2^{-\tau(r)} + \sum_{l=1}^{\tau(r)-1} 2^{-l}$. All iterations $T^k$ of $T$ for $-\infty < k < \infty$ are well defined and invertible with the exception of the set of dyadic rationals, which has Lebesgue measure zero. In the future we will neglect this set. One of the referees pointed out that transformation $T$ could be defined recursively as

$$Tr = \begin{cases} r - 0.5 & \text{if } 0.5 \le r < 1 \\ \frac{1 + T(2r)}{2} & \text{if } 0 \le r < 0.5. \end{cases}$$
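The action of $T$ on leading bits can be checked mechanically. The sketch below works on finite bit lists $(r_1, \ldots, r_K)$ containing at least one 1, and compares definition (20) with a recursive form that follows from (20): subtract 0.5 on $[0.5, 1)$, and use $Tr = 1/2 + T(2r)/2$ on $(0, 0.5)$. The helper names are hypothetical, and the float recursion is intended for dyadic inputs only, so that it terminates and stays exact.

```python
def T_bits(bits):
    """Apply (20) to leading bits (r_1, ..., r_K); needs at least one 1 among them.

    Bits before the first 1 become 1, the first 1 becomes 0, later bits are unchanged.
    """
    tau = bits.index(1)  # zero-based position of the first 1
    return [1] * tau + [0] + list(bits[tau + 1:])

def interval_index(bits):
    """Index j = sum_l 2^(l-1) r_l of the dyadic cell determined by the leading bits."""
    return sum(b << l for l, b in enumerate(bits))

def to_float(bits):
    """The number 0.r_1 r_2 ... r_K in binary."""
    return sum(b * 2.0 ** -(l + 1) for l, b in enumerate(bits))

def T_recursive(r):
    """Recursive form of T, for dyadic r in (0, 1) only."""
    if r >= 0.5:
        return r - 0.5
    return 0.5 + 0.5 * T_recursive(2.0 * r)

print(T_bits([0, 0, 1, 1]), to_float(T_bits([0, 0, 1, 1])), T_recursive(0.1875))
```

Reading the leading bits as a binary integer with $r_1$ as the least significant digit, `T_bits` lowers that integer by one, so the transformation behaves like an odometer run backwards.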



Let $\mathcal{S}_i = \{I_0^i, \ldots, I_{2^i-1}^i\}$ be a partition of $[0,1)$ where for each integer $j$ in the range $0 \le j < 2^i$, $I_j^i$ is defined as the set of numbers $r = \sum_{v=1}^{\infty} r_v 2^{-v}$ whose binary expansion $0.r_1, r_2, \ldots$ starts with the bit sequence $j_1, j_2, \ldots, j_i$, that is, with the reversal of the binary expansion $j_i, \ldots, j_2, j_1$ of the number $j = \sum_{l=1}^{i} 2^{l-1} j_l$. Observe that in $\mathcal{S}_i$ there are $2^i$ left-semiclosed intervals and each interval $I_j^i$ has length (Lebesgue measure) $2^{-i}$. Now $I_j^i$ is mapped linearly under $T$ onto $I_{j-1}^i$ for $j = 1, \ldots, 2^i - 1$. To confirm this, observe that for $j = 1, \ldots, 2^i - 1$, if $r \in I_j^i$ then

$$\begin{aligned} Tr &= \sum_{l=1}^{\tau(r)-1} 2^{-l} + \sum_{l=\tau(r)+1}^{\infty} r_l 2^{-l} \\ &= r - \sum_{l=1}^{i} 2^{-l} \left( j_l - (j-1)_l \right) \\ &= \sum_{l=1}^{i} (j-1)_l 2^{-l} + \sum_{l=i+1}^{\infty} r_l 2^{-l}. \end{aligned}$$

Now if $0 < r \in I_0^i$ then $\tau(r) > i$ and so $Tr \in I_{2^i-1}^i$. Furthermore, if $r \in I_{2^i-1}^i$ then $r_1 = \ldots = r_i = 1$, and thus we conclude that $(T^{-1}r)_1 = \ldots = (T^{-1}r)_i = 0$, that is, $T^{-1}r \in I_0^i$. Let $r \in [0,1)$ and $n \ge 1$ be arbitrary. Then $r \in I_j^n$ for some $0 \le j \le 2^n - 1$. For all $j - (2^n - 1) \le k \le j$,

$$T^k r = \sum_{l=1}^{n} (j-k)_l 2^{-l} + \sum_{l=n+1}^{\infty} r_l 2^{-l}. \tag{21}$$

Now since $T^{-1} I_j^i = I_{j+1}^i$ for $i \ge 1$, $j = 0, \ldots, 2^i - 2$, and the union over $i$ and $j$ of these sets generates the Borel $\sigma$-algebra, we conclude that $T$ is measurable. Similar reasoning shows that $T^{-1}$ is also measurable. The dynamical system $(\Omega, \mathcal{F}, \mu, T)$ is identified with $\Omega = [0,1)$ and $\mathcal{F}$ the Borel $\sigma$-algebra on $[0,1)$, $T$ being the transformation developed above. Take $\mu$ to be Lebesgue measure on the unit interval. Since transformation $T$ is measure-preserving on each set in the collection $\{I_j^i : 1 \le j \le 2^i - 1, 1 \le i < \infty\}$ and these intervals generate the Borel $\sigma$-algebra $\mathcal{F}$, $T$ is a stationary transformation. Now we prove that transformation $T$ is ergodic as well. Assume $TA = A$. If $r \in A$ then $T^l r \in A$ for $-\infty < l < \infty$. Let $R_n : [0,1) \to \{0,1\}$ be the function $R_n(r) = r_n$. If $r$ is chosen uniformly on $[0,1)$ then $R_1, R_2, \ldots$ is a series of i.i.d. random variables. Let $\mathcal{F}_n = \sigma(R_n, R_{n+1}, \ldots)$. By (21) it is immediate that $A \in \cap_{n=1}^{\infty} \mathcal{F}_n$ and so $A$ is a tail event. By Kolmogorov's zero-one law $\mu(A)$ is either zero or one. Hence $T$ is ergodic.

Next we construct the sequence $\{B_n\}$, described at the beginning of this proof, which forces the partitioning method to make "no data" estimations infinitely often. For each $B_n$ we require that

$$T^0 B_n, \ldots, T^{-n} B_n \text{ be disjoint.} \tag{22}$$

The definition is inductive on $k \ge 1$. For $k = 1$, we define $B_1 = I_0^1$, that is, $B_1$ is taken to be the left half of the unit interval. Since $T^{-1} I_0^1 = I_1^1$, condition (22) is satisfied. Recursively, for $k = 2, 3, \ldots$ we define $B_l$ for $2^{k-2} < l \le 2^{k-1}$. Suppose that by the end of the construct for $k - 1$ we have defined $B_l$ for $1 \le l \le 2^{k-2}$ so that condition (22) is satisfied with $n = l$. For the next iteration, $k$, we define $B_{2^{k-2}+l}$ for $1 \le l \le 2^{k-2}$ by

$$B_{2^{k-2}+l} = I^k_{2^{k-1}-2l}$$

and since

$$T^{-m} B_{2^{k-2}+l} = I^k_{2^{k-1}-2l+m}$$

for $0 \le m \le 2^{k-2} + l$, condition (22) is satisfied. Take $C_k$ to be the union of the newly defined $B_l$'s:

$$C_k = \bigcup_{l=2^{k-2}+1}^{2^{k-1}} B_l = \{r = 0.r_1, \ldots : r_1 = 0, r_k = 0\}.$$
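The index bookkeeping of this stage can be verified by enumeration. The sketch below reads the construction as choosing, at stage $k \ge 2$, the cells with indices $2^{k-1} - 2l$ for $l = 1, \ldots, 2^{k-2}$ in the stage-$k$ dyadic partition, and checks two things: that these are exactly the cells whose points have $r_1 = 0$ and $r_k = 0$, and that shifting an index by up to $n = 2^{k-2} + l$ steps never leaves the range $0, \ldots, 2^k - 1$, which is what keeps the shifted sets distinct cells. The function names are hypothetical.

```python
def reversed_bits(j, k):
    """Bits (j_1, ..., j_k) with j = sum_l 2^(l-1) j_l: the leading bits of points in cell j."""
    return [(j >> l) & 1 for l in range(k)]

def stage_indices(k):
    """Cell indices chosen at stage k >= 2, one for each l = 1, ..., 2^(k-2)."""
    return [2 ** (k - 1) - 2 * l for l in range(1, 2 ** (k - 2) + 1)]

def check_stage(k):
    chosen = set(stage_indices(k))
    # Cells whose points have first bit 0 and k-th bit 0.
    expected = {j for j in range(2 ** k)
                if reversed_bits(j, k)[0] == 0 and reversed_bits(j, k)[-1] == 0}
    # Disjointness: index j shifted by m = 0, ..., n must stay at most 2^k - 1.
    fits = all(2 ** (k - 1) - 2 * l + (2 ** (k - 2) + l) <= 2 ** k - 1
               for l in range(1, 2 ** (k - 2) + 1))
    return chosen == expected and fits

print([check_stage(k) for k in range(2, 9)])
```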

Now

$$\begin{aligned} \mu\left( \limsup_{n\to\infty} B_n \right) &= \mu\left( \limsup_{n\to\infty} C_n \right) \\ &= \mu(\{r \in [0,1) : r_1 = 0, \ r_n = 0 \text{ for infinitely many } n\}) \\ &= 0.5 \end{aligned}$$

since the set of real numbers in $[0, 0.5)$ having infinitely many zero bits in their expansion constitutes a set of Lebesgue measure 0.5. Define the process as follows: for $\omega$ randomly chosen from $[0,1)$ according to Lebesgue measure $\mu$, the dynamical system construct has us take $X_i(\omega) = T^{i+1}\omega$. Notice that the time series $\{X_i\}_{-\infty}^{\infty}$ is not just stationary and ergodic but also Markovian with continuous state space. Notice also that any observation $X_i$ determines the entire future and past. By (22), if $\omega \in B_n$ then $X_{-1}(\omega) \in B_n$ and $X_{-1-i}(\omega) \notin B_n$ for all $1 \le i \le n$. We will construct a partitioning estimator which satisfies the conditions of the definition given above and yet which is ineffective for this process. Take $\{H_{n,j}\}_{j=1}^{q(n)}$ to be a partition of $[0,1)$ by intervals of length $h_n = 1/q(n)$ such that

$$h_n \to 0 \tag{23}$$

and

$$n h_n \to \infty. \tag{24}$$

Let $A^+_{n,j} = H_{n,j} \cap B_n$ and $A^-_{n,j} = H_{n,j} \cap \bar{B}_n$, the overbar denoting complementation. Choose $\mathcal{P}_n = \{A^+_{n,j}, A^-_{n,j} : j = 1, \ldots, q(n)\}$. Partition $\mathcal{P}_n$ satisfies the conditions (16) and (17). If $\omega \in B_n$ then for some $1 \le j \le q(n)$, $X_{-1}(\omega) \in A^+_{n,j}$ and $X_{-1-i}(\omega) \notin A^+_{n,j}$ for all $1 \le i \le n$.

The left half $B_1 = I_0^1$ of $[0,1)$ is mapped to the right half $TB_1 = I_1^1$ and $B_n \subseteq B_1$, so $E(X_0 \mid X_{-1})(\omega) \ge 0.5$ if $\omega \in B_n$. On the other hand, $\hat{m}_n(X_{-1})(\omega) = 0$ if $\omega \in B_n$. Thus

$$P\left( \limsup_{n\to\infty} |\hat{m}_n(X_{-1}) - m(X_{-1})| \ge 0.5 \right) \ge \mu\left( \limsup_{n\to\infty} B_n \right) = 0.5.$$

□

Theorem 4 For the partitioning estimate $\hat{m}_n(x)$, defined by (18), there is a stationary ergodic process $\{X_i\}$ with marginal distribution uniform on $[0,1)$ and a sequence of partitions $\mathcal{P}_n$ satisfying (16) and (17) such that for large $n$,

$$P\left( \int |\hat{m}_n(x) - m(x)| \mu(dx) \ge 1/16 \right) \ge \frac{1}{8}.$$

Proof The proof is a slight extension of Shields' construction, in which he proved the non-consistency of the histogram density estimate from ergodic observations (cf. p. 60 in [7]). The dynamical system $(\Omega, \mathcal{F}, \mu, T)$ is determined by $\Omega = [0,1)$, $\mathcal{F}$ the Borel $\sigma$-algebra, $\mu$ the Lebesgue measure on $[0,1)$, and $T\omega = \omega + \alpha \bmod 1$ for some irrational $\alpha$. The dynamical system $(\Omega, \mathcal{F}, \mu, T)$ is stationary and ergodic by [3]. Let $X_i(\omega) = T^{i+1}\omega$. We will apply Rohlin's lemma (cf. [17]), according to which if $(\Omega, \mathcal{F}, \mu, T)$ is a nonatomic stationary and ergodic dynamical system then given $\epsilon > 0$ and positive integer $N$, there exists a set $S \in \mathcal{F}$ such that

$$S, T^{-1}S, \ldots, T^{-N+1}S$$

are disjoint and

$$\mu\left( \bigcup_{i=0}^{N-1} T^{-i}S \right) > 1 - \epsilon.$$

For $N = 4n$ and $\epsilon = 0.5$ we are assured of the existence of a set $S \in \mathcal{F}$ such that

$$\mu\left( \bigcup_{i=0}^{4n-1} T^{-i}S \right) > 0.5.$$

Put

$$B_n = \bigcup_{i=0}^{n-1} T^{-i}S$$

and

$$C_n = \bigcup_{i=0}^{2n-1} T^{-i}S.$$

Since $T^{-i}S$, $i = 0, \ldots, 4n - 1$, are disjoint and $T$ is measure preserving, we have $\mu(B_n) \ge 1/8$ and $1/4 \le \mu(C_n) \le 1/2$. Let $X_i(\omega) = T^{i+1}\omega$. The definitions of $B_n$ and $C_n$ imply that $T^{-i}B_n \subseteq C_n$ for $i = 0, \ldots, n - 1$, and thus on the event $B_n$ all of the random variables $X_{-1}, \ldots, X_{-n}$ are in $C_n$, so that $\mu_n(C_n) = \frac{1}{n-1} \sum_{i=1}^{n-1} I_{[X_{-i-1} \in C_n]} = 1$. Now let $\{H_{n,j}\}_{j=1}^{q(n)}$ be a partition of the unit interval by intervals of length $h_n = 1/q(n)$ satisfying (23) and (24). Let $A^+_{n,j} = H_{n,j} \cap C_n$ and $A^-_{n,j} = H_{n,j} \cap \bar{C}_n$. Now let $\mathcal{P}_n = \{A^+_{n,j}, A^-_{n,j} : j = 1, \ldots, q(n)\}$. It is immediate that $\mathcal{P}_n$ satisfies conditions (16) and (17).

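$$\begin{aligned} & \int |\hat{m}_n(x) - m(x)| \mu(dx) \\ &= \sum_{j=1}^{q(n)} \int_{A^+_{n,j}} \left| \frac{\nu_n(A^+_{n,j})}{\mu_n(A^+_{n,j})} - m(x) \right| \mu(dx) + \sum_{j=1}^{q(n)} \int_{A^-_{n,j}} \left| \frac{\nu_n(A^-_{n,j})}{\mu_n(A^-_{n,j})} - m(x) \right| \mu(dx) \\ &\ge \sum_{j=1}^{q(n)} \int_{A^-_{n,j}} \left| \frac{\nu_n(A^-_{n,j})}{\mu_n(A^-_{n,j})} - m(x) \right| \mu(dx) \\ &\ge \sum_{j=1}^{q(n)} \left| \mu(A^-_{n,j}) \frac{\nu_n(A^-_{n,j})}{\mu_n(A^-_{n,j})} - \int_{A^-_{n,j}} m(x) \mu(dx) \right|. \end{aligned} \tag{25}$$

On the event $B_n$, $\mu_n(C_n) = 1$ and consequently $\mu_n(A^-_{n,j}) = \nu_n(A^-_{n,j}) = 0$. Therefore on the event $B_n$, using the convention 0/0 = 0 and $m(x) = (x + \alpha) \bmod 1$,

$$\begin{aligned} \int |\hat{m}_n(x) - m(x)| \mu(dx) &\ge \sum_{j=1}^{q(n)} \int_{A^-_{n,j}} m(x) \mu(dx) \\ &\ge \sum_{j=1}^{q(n)} \mu(A^-_{n,j}) \inf_{x \in H_{n,j}} ((x + \alpha) \bmod 1). \end{aligned}$$

For $1 \le j \le q(n)$ let $g_n(j) = \inf_{x \in H_{n,j}} ((x + \alpha) \bmod 1)$ and $r_n(j) = \min\{l \ge 1 : g_n(j) < l h_n\}$. Notice that the function $r_n : \{1, \ldots, q(n)\} \to \{1, \ldots, q(n)\}$ is onto and invertible. Since $\mu(\bar{C}_n) \ge 0.5$, on the event $B_n$,

$$\begin{aligned} \int |\hat{m}_n(x) - m(x)| \mu(dx) &\ge \sum_{j=1}^{q(n)} \mu(A^-_{n,j}) g_n(j) \\ &\ge \sum_{l=1}^{q(n)} \mu(A^-_{n, r_n^{-1}(l)}) (l - 1) h_n \\ &\ge \sum_{l=1}^{\lfloor 0.5/h_n \rfloor} h_n (l - 1) h_n \\ &\ge 0.5 h_n^2 \left( \frac{0.5}{h_n} - 1 \right) \left( \frac{0.5}{h_n} - 2 \right) \\ &\ge \frac{1}{16} \end{aligned} \tag{26}$$

if $h_n < 1/12$. Since $h_n \to 0$, for large $n$, on the event $B_n$, the $L_1$ error is at least 1/16. That is, for large $n$,

$$P\left( \int |\hat{m}_n(x) - m(x)| \mu(dx) \ge 1/16 \right) \ge \mu(B_n) \ge \frac{1}{8}.$$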
The proof of Theorem 4 is complete. □

Remark 3 Let process $\{X_n\}$ and the sequence of partitions $\{\mathcal{P}_n\}$ be as in Theorem 4. Set $Z_n = X_{n-1}$ and $Y_n = X_n + (1 - \alpha) \bmod 1$. Define $m(z) = E(Y_0 \mid Z_0 = z)$. It is easy to see that $m(z) = z$. Define $\hat{m}_n(z)$ as in (15) with partition $\mathcal{P}_n$. The proof of Theorem 4 shows that the sequence of partitions $\{\mathcal{P}_n\}$ satisfies conditions (16) and (17) and